ABSTRACT

This paper focuses on the classification of motor imagery in a Brain-Computer
Interface (BCI) using machine learning classifiers. The BCI system consists
of two main steps: feature extraction and classification. Fast Fourier
Transform (FFT) features are extracted from the electroencephalography (EEG)
signals to transform the signals into the frequency domain. Because the
feature extraction stage produces high-dimensional data, Linear Discriminant
Analysis (LDA) is used to reduce the number of dimensions by finding the
feature subspace that optimizes class separability. Five classifiers are used
in the study: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naive
Bayes, Decision Tree and Logistic Regression. Performance was tested using
Dataset 1 from BCI Competition IV, which consists of imaginary hand and foot
movement EEG data. SVM, Logistic Regression and Naive Bayes achieved the
highest AUC, 89.09%.

1. INTRODUCTION

A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer
or other external equipment that can translate brain activity into commands [1-3], as shown in Figure 1. BCI
has the potential to become more popular in applications across various fields, for example medicine,
entertainment and the military [4]. There are many techniques for acquiring signals from the brain, such as
electroencephalography (EEG), magnetic resonance imaging (MRI) and computed tomography (CT).



Figure 1. BCI system 


Journal homepage: http://beei.org/index.php/EEI
ISSN: 2302-9285


EEG is one of the most commonly used techniques in BCI. It records the electrical activity of the brain
through electrodes placed on the scalp. EEG can capture brain information processing quickly, with high
temporal resolution, but it has low spatial resolution and a high noise level, which make it challenging to extract
useful information from EEG signals for BCI applications [5].

Motor imagery is a popular paradigm in EEG-based BCI systems [6]. This paradigm includes the
classification of imaginary movements such as hand and foot movements [7]. Machine learning
techniques are commonly used in this classification process because they can model high-dimensional
datasets [8]. Machine learning can be briefly defined as enabling computers to make
successful predictions using past experience [9].

In machine learning, there are various algorithms for classification, such as Support Vector
Machine (SVM), K-Nearest Neighbors (KNN), Naive Bayes, Decision Tree, and Logistic Regression. SVM
is one of the algorithms commonly used for motor imagery classification in BCI systems [3, 6], [10-13]. In
[14], motor imagery is classified by comparing the signals using the KNN classifier.
KNN is also used in automated seizure detection [15] to classify EEG signals as seizure or non-seizure.
The Naive Bayes approach is used in [16, 17] to classify motor imagery and lower limb movement, respectively.
In [18], a Decision Tree is used to classify brain signals from the imaginary tasks of opening and closing the hand
to hold a ball. Logistic Regression is used in [19] to classify left and right motor imagery signals.

The successful deployment of a BCI system depends on the effectiveness of the signal processing used to
classify the desired signals. Therefore, the aim of this paper is to study the performance of various
machine learning classification algorithms that can be used to classify motor imagery tasks.
We study five machine learning algorithms: SVM, KNN, Naive Bayes, Decision Tree and
Logistic Regression. The results help in choosing an algorithm that gives good classification
performance for motor imagery. The rest of this paper is organized as follows: Section 2 describes
the dataset used. Section 3 presents the research method. Section 4 presents and discusses the
results. Lastly, Section 5 concludes the paper.


2. DATA DESCRIPTION 

The dataset used is Dataset 1 [20] from BCI Competition IV, provided by B. Blankertz, C. Vidaurre
and K.-R. Müller from Berlin (Germany). Motor imagery was performed by four healthy participants who served
as experimental subjects (a, b, f and g) [21]. Each subject performed two mental tasks chosen from three:
right hand movement, left hand movement or foot movement (side chosen by the subject; optionally both feet).
The mental tasks performed by each subject are shown in Table 1.

The experiment has two sessions, a calibration session and an evaluation session, which record the training
and test data respectively. The training data were provided with complete marker information showing
where each mental task was performed, so that they could be used to adapt the parameters of the methods or models,
while the test data consisted only of the EEG signals, without any markers. Therefore, in this paper we use the
data from the calibration session only.

In the first two runs, all subjects were asked to perform a certain mental task based on arrows
pointing left, right, or down, presented as visual cues on a computer screen. The cues were displayed
for a period of 4 s, interleaved with 2 s of blank screen and 2 s with a fixation cross shown in the center
of the screen. Each subject has a total of 200 trials, since 50 trials of each of the two chosen classes
were presented in each run, as shown in Table 2. A break of 15 s was given for relaxation after every 15 trials and
there were longer breaks of 5-15 min between the runs.


Table 1. Mental tasks performed by each subject

Subject   Movements
a         Left hand, Right foot
b         Right hand, Left hand
f         Left hand, Right foot
g         Right hand, Left hand


Table 2. Number of trials for each of the chosen classes

Subjects  Movement    1st run  2nd run
a, f      Left hand   50       50
          Right hand  50       50
b, g      Left hand   50       50
          Right hand  50       50


3. RESEARCH METHOD 

3.1. Feature extraction based on FFT-LDA 

The aim of the feature extraction process is to extract the desired signals from the raw EEG signals and
eliminate unwanted signals such as background noise. EEG signals are divided by frequency
into the gamma, beta, alpha, theta, and delta bands. The frequency of motor imagery usually lies in the
alpha or beta band. Thus, the EEG signals should be processed in terms of frequency. The raw EEG
signals, which are in the time domain, need to be converted to the frequency domain for better extraction.

The Fast Fourier Transform (FFT) is applied to transform the signal from the time domain
to the frequency domain. In the FFT, the raw EEG signal is compared with sine waves of particular frequencies
and a matching score is calculated for each. The matching score depends on the similarity between the
signal and the sine wave.
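
As an illustration of the transform described above, the following is a minimal sketch of a radix-2 FFT applied to a synthetic signal. It is not the implementation used in this study: the sampling rate `fs`, the signal length and the 10 Hz test tone are assumptions chosen only so that the peak of the spectrum is easy to check, and the helper names `fft` and `magnitude_spectrum` are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("signal length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor: the complex sinusoid the odd half is matched against.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(signal, fs):
    """Return (frequencies in Hz, normalised magnitudes) for the non-negative bins."""
    n = len(signal)
    spec = fft(signal)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    mags = [abs(spec[k]) / n for k in range(n // 2 + 1)]
    return freqs, mags


fs = 128   # assumed sampling rate in Hz (illustrative only)
n = 256    # assumed number of samples (power of two)
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]  # 10 Hz tone

freqs, mags = magnitude_spectrum(signal, fs)
peak_hz = freqs[max(range(len(mags)), key=mags.__getitem__)]
```

Each magnitude plays the role of the "matching score" for one frequency: the bin at 10 Hz, which lies in the alpha band, dominates the spectrum of this test signal.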

Linear Discriminant Analysis (LDA) is then applied to the FFT features to increase computational
efficiency by reducing the number of dimensions in the dataset. This also reduces the degree of over-fitting,
and class separability is optimized by finding the feature subspace [22]. The features extracted
by FFT-LDA are then fed into several classifiers for classification. Figure 2 shows the block
diagram of the feature extraction process.



Figure 2. Block diagram of feature extraction process 
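
The dimensionality reduction performed by LDA can be sketched for the two-class case, where the feature subspace is a single direction. The example below is a hedged illustration, not the code used in this study: it assumes two-dimensional toy features, uses the classical Fisher direction w = S_W^{-1}(m_1 - m_0), and all function names and data values are hypothetical.

```python
def mean_vec(rows):
    """Column-wise mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_2d(rows, mu):
    """2x2 scatter matrix of one class around its mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Fisher LDA direction w = Sw^-1 (m1 - m0) for two classes of 2-D data."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter_2d(class0, m0), scatter_2d(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(rows, w):
    """Reduce each 2-D sample to one LDA feature."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]


# Toy feature vectors (assumed values, not EEG data).
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
z0, z1 = project(class0, w), project(class1, w)
```

With two classes, LDA yields at most one discriminant component, so every trial is reduced to a single number; on this toy data the projected values of the two classes do not overlap.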


3.2. Classification 

Classification is the process of identifying the class of the samples in the dataset. In this
paper, the classifiers are used to identify the type of movement, such as left hand, right hand or right foot, as
shown in Figure 3. The following five classifiers are used in this paper:



Figure 3. Block diagram of classification process 


3.2.1. Support Vector Machine (SVM) 

Support Vector Machine (SVM) is a classifier based on Statistical Learning Theory [23]. Its
algorithm finds a decision boundary between the class samples that correctly separates the samples
into their classes. SVM can effectively avoid the defects of traditional classification methods, such as
over-learning, the curse of dimensionality and local minima [3]. The classifier aims to maximize the distance between
the decision boundary and the support vectors, which is called the margin [22]. The samples are thus separated
according to their class, since the algorithm finds the decision boundary between the classes and maximizes
the margin.
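
To make the notion of margin concrete, the sketch below computes the geometric margin of a given linear decision boundary w·x + b = 0. It only evaluates candidate boundaries and does not train an SVM; the function name, the toy points and the two candidate boundaries are assumptions made for illustration.

```python
import math


def geometric_margin(w, b, xs, ys):
    """Smallest signed distance from any sample to the boundary w.x + b = 0.

    Labels in ys must be +1 or -1. A positive result means every sample
    lies on the correct side of the boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(xs, ys)
    )


# Toy, linearly separable data (assumed values).
xs = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
ys = [-1, -1, 1, 1]

margin_centered = geometric_margin((1.0, 0.0), -1.0, xs, ys)  # boundary x1 = 1
margin_shifted = geometric_margin((1.0, 0.0), -0.5, xs, ys)   # boundary x1 = 0.5
```

Both boundaries separate the classes correctly, but the centered one has the larger margin, which is the boundary an SVM would prefer on this data.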

3.2.2. K-Nearest Neighbors (KNN) 

K-Nearest Neighbors (KNN) is a non-parametric learning algorithm [24] that is capable of
characterizing EEG data, as it is suitable for noisy and large data. Its result depends on the value of k and
the distance metric used [23]. A new data point is classified by locating the k
samples nearest to it under the chosen distance metric; its class label is then determined from the
class labels of those k nearest neighbors.
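
The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance, a majority vote among the k neighbors, and toy training points; the function name `knn_predict` and the data values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort all training samples by Euclidean distance to the query point.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy two-class data (assumed values).
train_x = [(0.20, 0.20), (0.30, 0.25), (0.25, 0.30),
           (0.60, 0.50), (0.70, 0.45), (0.65, 0.40)]
train_y = ["circle", "circle", "circle",
           "triangle", "triangle", "triangle"]

predicted = knn_predict(train_x, train_y, (0.6, 0.45), k=3)
```

Here the three nearest neighbors of the query point all belong to the triangle class, so the query is assigned to that class.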

The concept of the KNN algorithm is illustrated in Figure 5 using one distance metric. The
symbol 'X' at point (0.6, 0.45) in the figure shows the new data point to be classified, and the dotted
circle is the result of applying the KNN algorithm with k=9 using Euclidean distance. In this case, there are two
possible classes: the circle class and the triangle class. KNN assigns the new data point to the triangle class, as the
triangle class has the highest number of samples within the radius [7].

Figure 5. KNN classification (Euclidean distance with k=9) [7]


3.2.3. Naive Bayes 

This classifier is based on Bayes' theorem [25]. It calculates a set of probabilities by counting the
frequency and combinations of values in a given dataset [26]. The algorithm is defined as:




$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \qquad (1)$$


where P(C | X) denotes the posterior probability, which is the probability of class C given the data X; P(C)
denotes the class prior probability, which is the probability of class C before the data X is observed; P(X) denotes
the predictor prior probability, which is the probability of the data regardless of its class; and P(X | C) denotes
the likelihood, which is the probability of the data X given that the class is C.
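
The sketch below shows how the posterior P(C | X) can be computed from a prior and a likelihood. Because the features here are continuous, it assumes a Gaussian likelihood for each feature, which is one common choice and is not stated in this paper; the function names, class labels and data values are hypothetical.

```python
import math


def fit_gaussian_nb(xs, ys):
    """Estimate the prior, per-feature mean and variance for each class."""
    model = {}
    for c in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant avoids a zero variance (an assumption of this sketch).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[c] = (n / len(xs), means, variances)
    return model


def posterior(model, x):
    """Return P(C | X = x) for every class via Bayes' theorem."""
    log_joint = {}
    for c, (prior, means, variances) in model.items():
        lp = math.log(prior)                      # log P(C)
        for v, m, s2 in zip(x, means, variances):  # + log P(X | C)
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[c] = lp
    # Dividing by the sum over classes corresponds to dividing by P(X).
    top = max(log_joint.values())
    unnorm = {c: math.exp(lp - top) for c, lp in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


# Toy one-dimensional features (assumed values).
xs = [[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]]
ys = ["left", "left", "left", "right", "right", "right"]

model = fit_gaussian_nb(xs, ys)
post = posterior(model, [1.1])
```

The predicted class is the one with the largest posterior; for the query value 1.1 this is the "left" class.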


3.2.4. Decision tree 

The Decision Tree classifier breaks down a dataset by repeatedly splitting it into two or more subsets. This
classifier works like a tree with a root node, decision nodes and terminal nodes. The root node represents the
entire dataset, which is split into two or more branches (sub-trees). A decision node is a node that splits
into further sub-nodes, and a terminal node, also called a leaf node, represents a decision or
classification that is not split further.
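
A single split at the root node can be sketched as follows. The splitting criterion is not specified in this paper; the example assumes the Gini impurity, a common choice, and searches one feature for the threshold that gives the purest two branches. The function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        thr = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best


# Toy single-feature data (assumed values).
values = [1.0, 2.0, 3.0, 4.0]
labels = ["left hand", "left hand", "right foot", "right foot"]

threshold, impurity = best_split(values, labels)
```

A full tree applies the same search recursively to each branch until the nodes are pure or a stopping rule is met; both branches in this toy example are already pure, so they become leaf nodes.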


3.2.5. Logistic regression 

Logistic Regression predicts the class of a sample by fitting the data to the logistic function
(the inverse of the logit function). The logit function takes input values in the range 0 to 1 and transforms them to
values over the entire real number range, which can be used to express a linear relationship between the feature
values and the log-odds [22]:


$$\operatorname{logit}\big(P(y = 1 \mid x)\big) = w_0 x_0 + w_1 x_1 + \cdots + w_m x_m = \sum_{i=0}^{m} w_i x_i = w^T x \qquad (2)$$

where P(y = 1 | x) is the conditional probability that a particular sample belongs to class 1 given its features
x. The inverse of the logit function is called the logistic function, defined as:


$$\phi(z) = \frac{1}{1 + e^{-z}} \qquad (3)$$

where z is the net input, the linear combination of the weights and the sample features (with x_0 = 1), which can be calculated as:

$$z = w^T x = w_0 + w_1 x_1 + \cdots + w_m x_m \qquad (4)$$
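
Equations (2) to (4) can be checked numerically with the short sketch below. It is illustrative only: the weights and features are assumed values, and the function names are hypothetical. The first feature is fixed to 1 so that the first weight acts as the bias term.

```python
import math


def logistic(z):
    """Logistic function of Eq. (3), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Log-odds of a probability p in (0, 1); the inverse of `logistic`."""
    return math.log(p / (1.0 - p))


def net_input(w, x):
    """Net input z = w^T x of Eq. (4); x[0] is assumed to be 1 (bias)."""
    return sum(wi * xi for wi, xi in zip(w, x))


w = [-1.0, 2.0, 0.5]   # assumed weights w_0, w_1, w_2
x = [1.0, 0.8, 1.4]    # x_0 = 1, followed by two feature values

z = net_input(w, x)
p_class1 = logistic(z)                 # P(y = 1 | x)
predicted_class = 1 if p_class1 >= 0.5 else 0
```

Applying `logit` to `p_class1` recovers `z`, which is the linear relationship between the features and the log-odds stated in Eq. (2).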


4. RESULTS AND ANALYSIS 

The classification of imaginary movements for each subject involves two classes: right
hand vs left hand, or left hand vs right foot. In this section, the classification results obtained
with the various machine learning classifiers are tabulated and plotted.
10-fold cross-validation is used to verify the results. The performance of the
classifiers is measured using accuracy and AUC (area under the curve). AUC is based on sensitivity
and specificity, which are linked to the true and false positive rates. Therefore, it gives more meaningful
information than accuracy [27].
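
The evaluation procedure can be sketched as follows. The code is a hedged illustration rather than the evaluation script used in this study: the fold assignment scheme, the rank-based AUC formula and all names and values are assumptions.

```python
def kfold_indices(n, k):
    """Split sample indices 0..n-1 into k (train, test) pairs."""
    folds = [list(range(i, n, k)) for i in range(k)]
    everything = set(range(n))
    return [(sorted(everything - set(f)), f) for f in folds]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a positive sample
    is scored higher than a negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# 200 trials per subject split into 10 folds gives 20 test trials per fold.
splits = kfold_indices(200, 10)

# Toy labels and classifier scores (assumed values).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

Accuracy depends on the decision threshold applied to the scores, whereas AUC summarizes how well the scores rank the two classes across all thresholds.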

The results of the classifiers for all subjects are given in Table 3. For classification
between left hand and right foot, SVM gives the highest accuracy, with 77.73% for
subject a and 71.18% for subject f. In terms of AUC, SVM, Logistic Regression and Naive Bayes
give the highest result, with 86.16% for subject a and 80.02% for subject f. For classification between
right hand and left hand, Logistic Regression gives the highest accuracy for subject b (66.34%) and
Naive Bayes gives the highest accuracy for subject g (79.23%). In terms of AUC, SVM,
Logistic Regression and Naive Bayes give the highest result, with 72.20% for subject b and 89.09% for
subject g. Figure 8 and Figure 9 compare the accuracy and AUC measurements for
left hand vs right foot and right hand vs left hand, respectively.


Table 3. Results for subjects a, f, b and g

                      Left hand & Right foot                  Left hand & Right hand
                      a                   f                   b                   g
Classifier            Accuracy  AUC       Accuracy  AUC       Accuracy  AUC       Accuracy  AUC
                      (%)       (%)       (%)       (%)       (%)       (%)       (%)       (%)
SVM                   77.73     86.16     71.18     80.02     66.01     72.20     78.89     89.09
k-NN                  76.73     85.24     69.50     78.08     60.98     67.71     76.23     87.35
Logistic Regression   77.40     86.16     71.01     80.02     66.34     72.20     78.73     89.09
Decision Tree         77.40     84.60     67.82     75.94     64.01     69.08     76.21     87.48
Naive Bayes           77.40     86.16     71.01     80.02     65.68     72.20     79.23     89.09


Figure 8. Accuracy and AUC measurement for left hand and right foot motor imagery



Figure 9. Accuracy and AUC measurement for right hand and left hand motor imagery


5. CONCLUSION

This paper focused on the classification of EEG signals for imaginary movements in order to analyze the
performance of machine learning classifiers. Feature extraction was performed
using the FFT-LDA technique. Five classifiers were used to classify the data: SVM, k-NN,
Logistic Regression, Decision Tree and Naive Bayes. The AUC values were higher than the corresponding
accuracy values for every classifier and subject. The best result, an AUC of 89.09%, was obtained using the
SVM, Logistic Regression and Naive Bayes classifiers. In future work, we will investigate how to improve this
approach to obtain better results than existing ones for EEG classification.